Abstract



The story of the Viterbi algorithm (VA) is told from a personal perspective. Applications 
both within and beyond communications are discussed. In brief summary, the VA has proved 
to be an extremely important algorithm in a surprising variety of fields. 



1 Introduction 

Andrew J. Viterbi is rightly celebrated as one of the leading communications engineers and 
theorists of the twentieth century. He has received almost every professional award possible, 
including election not only to the National Academy of Engineering (USA) but also to the 
National Academy of Sciences (USA), where he chairs the Computer Science section. His award 
citations usually cite "invention of the Viterbi algorithm" as his most notable accomplishment. 

On the other hand, Andy would be the first to tell you that other people deserve much of the 
credit for recognizing its theoretical properties and its practical attractiveness, and for extending 
its domain of application. He has often told this story himself (see, e.g., [33]).

Nevertheless, no one doubts that Andy's awards are entirely deserved, and that their focus on 
the Viterbi algorithm (VA) is appropriate. This article will attempt to explain why, by briefly 
recounting the history of the VA. It is a "personal history," because the story of the VA is so 
intertwined with my own history that I can recount much of it from a personal perspective. 



2 Invention of the Viterbi algorithm 

The Viterbi algorithm was first presented in Andy's famous 1967 paper jHU] to help prove an 
asymptotically optimum upper bound on the error probability of convolutional codes, which had 
previously been derived by Yudkin in the context of sequential decoding [37j. In this paper, the 
VA is presented just as we understand it today. This paper introduces the important concept of 
survivors (a term possibly borrowed from tennis elimination tournaments), and shows that only 
q^K survivors need be retained to decode a convolutional code with constraint length K over the 
q-ary field GF(q). Compared to a block code with q^K codewords, such a convolutional code is 
shown to have a much better error exponent, particularly near capacity. 



Andy recalls in a 1999 interview [22] that 

"the Viterbi algorithm for convolutional codes . . . came out of my teaching .... I 
found information theory difficult to teach, so I started developing some tools. ... I 
wrote the first paper in March '66, but it wasn't published until April '67. ... At 
one point I was actually discouraged from publishing the algorithm details. Fortunately, 
one of the reviewers, Jim Massey, encouraged me to include the algorithm. 
. . . Nobody thought that it had any potential for practical value ..." 

It is clear from the paper that at this point Andy had no idea that the VA was actually 
an optimum (maximum likelihood) decoder, nor that it was potentially practical. Indeed, the 
paper states that "this decoding algorithm is clearly suboptimal," and concludes: "Although 
this algorithm is rendered impractical by the excessive storage requirements, it contributes to a 
general understanding of convolutional codes and sequential decoding through its simplicity of 
mechanization and analysis" j.'SOj . 

3 Discovery that the VA is optimum 

I believe that I received a copy of Andy's paper prior to publication, probably via Jim Massey. 
At that time I was working at Codex Corp., a small start-up company aiming at practical 
applications of convolutional codes. Our primary focus was initially on threshold decoding, 
which was the subject of Jim's doctoral thesis [25]; Jim was a consultant. Subsequently, we 
developed a sequential decoding system [SHI for the Pioneer deep-space satellite program, which 
became the first code in space p]. 

I had been trying to understand why in practice convolutional codes were generally superior 
to block codes, so I studied Andy's paper with great interest. I realized that the path-merging 
property of convolutional codes could be depicted in what I called a trellis diagram, to contrast 
with the then-conventional tree diagram used in the analysis of sequential decoding. It was then 
only a small step to see that the Viterbi algorithm was an exact recursive algorithm for finding 
the shortest path through a trellis, and thus was actually an optimum trellis decoder. I believe 
that at that point I called Andy, and told him that he had been too modest when he asserted 
that the VA was "asymptotically optimum." 
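
The shortest-path view of the trellis can be illustrated with a minimal sketch. The code below is
not taken from any of the papers cited here: it assumes a small rate-1/2, 4-state convolutional
code with generator taps 7 and 5 (octal), hard decisions, and Hamming distance as the branch
metric. The names `step`, `encode`, and `viterbi_decode` are hypothetical. One survivor is kept
per state, and at each step the competing paths that merge into a state are compared and only
the better one is retained.

```python
# Illustrative rate-1/2 convolutional code with 4 states (2 memory bits).
# The generator taps (7 and 5 in octal) are an assumption for this sketch.
G = (0b111, 0b101)
MEMORY = 2
N_STATES = 1 << MEMORY


def step(state, bit):
    """Return (next_state, output_pair) for one trellis branch."""
    reg = (bit << MEMORY) | state            # newest bit in the top position
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode bits, appending MEMORY zeros so the encoder returns to state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * MEMORY:
        state, pair = step(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    """Find the minimum-Hamming-distance path through the trellis."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)    # the encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]    # one survivor per state
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metric = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue                     # state not yet reachable
            for b in (0, 1):
                nxt, out = step(s, b)
                # add: accumulate the branch metric
                m = metric[s] + (out[0] != r[0]) + (out[1] != r[1])
                # compare-select: keep only the better path into nxt
                if m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[0][:-MEMORY]                # end in state 0, drop the tail bits
```

With this assumed code, whose free distance is 5, any two channel bit errors in a terminated
block are corrected, because every other codeword remains farther from the received sequence
than the transmitted one.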

These results were written up in a 1967 technical report |S] for NASA Ames Research Center. 
They were not published in journal form until many years later, in [2] and [10]. 

Shortly afterward, in a paper submitted in May 1968 [23], Jim Omura observed that the VA was 
simply the standard forward dynamic programming solution to maximum-likelihood decoding 
of a discrete-time, finite-state dynamical system observed in memoryless noise. Beyond proving 
optimality in a different way, he thus made the first connection between the VA and system and 
control theory. It is interesting to speculate whether the history of the VA would have been 
different if it had simply been called "dynamic programming" from the beginning. 

At this point, none of us had recognized that the VA might be practical. Jim's paper concludes: 
". . . the decoding algorithm discussed here grows exponentially in complexity with constraint 
length v and is therefore impractical for large v." More embarrassingly, in a 1970 IEEE 
Spectrum paper [7] describing practical coding schemes for the space channel, I wrote: 

Sequential decoding [is] the best-performing practical technique known for memoryless 
channels like the space channel, and will probably be the general-purpose 
workhorse for these channels in the future .... 

[The Viterbi algorithm] is competitive in performance with sequential decoding for 
moderate error rates, but cannot achieve very low error rates efficiently. On the other 
hand, it [is] capable of extremely high speeds (tens of megabits), where sequential 
decoders become uneconomic. It therefore may find application in high-data-rate 
systems with modest error requirements, such as digitized television. 

4 Recognition that the VA is practical 

Andy has always said that Jerry Heller was the first person to realize that the VA might be 
practical. Jerry simulated the performance of short-constraint-length codes at the Jet Propulsion 
Laboratory (JPL) in 1968-69 [EH [II], and found that with only a 64-state code he could obtain 
a sizable coding gain, of the order of 6 dB. 

In 1968, Andy, Irwin Jacobs, and Len Kleinrock incorporated Linkabit Corp. in San Diego as a 
vehicle to pool their consulting efforts and to obtain small government study contracts. All kept 
their jobs as professors. In 1969, Jerry Heller was hired as Linkabit's first full-time employee. 
Linkabit obtained some small Navy and NASA contracts, which enabled the construction of a 
VA prototype in 1969-70. "It was a big monster filling a rack" [22]. 

The first IEEE Communication Theory Workshop in 1970 in St. Petersburg became famous as 
the "coding is dead" workshop, after Ned Weldon and other speakers worried publicly that coding 
theory had come to a dead end. But what I remember best from that session is Irwin Jacobs 
standing up in the back row, flourishing an integrated circuit (a 4-bit shift register, I believe), 
and asserting that this represented the future of coding. He was quite right. (Unfortunately, by 
this time Codex had made a business decision to get out of coding.) 

By 1971, Linkabit had implemented a 2 Mb/s, 64-state Viterbi decoder. In a special issue 
on coding of the IEEE Transactions on Communication Technology in October 1971, 
Heller and Jacobs JH] discuss this decoder and many practical issues in careful detail. They 
compare the VA with sequential decoding, and conclude that the VA will often be preferable 
because it can use quantized soft decisions easily, and is less sensitive to channel and equipment 
variations. In the same issue, Cohen, Heller and Viterbi [3] describe a system using orthogonal 
convolutional codes and the VA for asynchronous multiple-access communications, and Viterbi 
[32] introduces generating-function analysis techniques for the VA. 

During the 1970s, through the leadership of Linkabit and JPL, the VA became part of the 
coding standard for deep-space communication, ultimately in a concatenated coding system 
with a Reed-Solomon (RS) outer code. Linkabit developed a relatively inexpensive and flexible 
VA chip, and the VA became a nice little business for Linkabit. It didn't hurt that the inventor 
of the Viterbi algorithm was a Linkabit founder. The VA also began to be incorporated in many 
other communications applications. 

In the early 1990s, JPL built a 2^14-state "Big Viterbi Decoder" (BVD) with 8192 parallel 
add-compare-select (ACS) units, which operated at a rate of the order of 1 Mb/s. As far as 
I know, the BVD remains the biggest Viterbi decoder ever built. 

When the primary antenna failed to deploy during the Galileo mission in 1992, JPL devised an 
elaborate concatenated coding scheme involving a 2^14-state, rate-1/4 inner convolutional code and 
a set of variable-strength RS outer codes, and reprogrammed it into the spacecraft computers. 
This scheme was able to operate within about 2 dB of the Shannon limit at a bit error probability 
of less than 10^-6, which was the world record prior to the advent of turbo codes jSJ. 

5 The VA and intersymbol interference channels 

In the late 1960s, Codex turned its attention to the voiceband modem business. Our first- 
generation product was a single-sideband (SSB) 9600 b/s modem with a so-called Class IV 
or 1 - D^2 "partial response." About 1969, I recognized that the symbol correlation that was 
thus introduced could be exploited by an ad hoc error correction algorithm, which was able to 
improve the noise margin by about 2-3 dB. This little decoder extended the commercial life of 
this marginal-performance modem by perhaps a year or two. 

It took me a while to understand that I had in fact invented a maximum-likelihood sequence 
detector for this modem. Over time, I realized that this was nothing more than the Viterbi 
algorithm again, streamlined for the 1 - D^2 response. This led to a 1972 paper [5] that showed 
that the VA could be used as a maximum-likelihood sequence detector for digital sequences in 
the presence of intersymbol interference (ISI) and additive white Gaussian noise (AWGN). 
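
As a rough illustration of sequence detection on a partial-response channel, the sketch below
applies the VA to a 1 - D^2 response. Everything specific in it is an assumption made for the
example: binary inputs of +1 and -1, a known starting state, a squared-error branch metric
(appropriate for Gaussian noise), and the hypothetical names `channel` and `mlsd`. It is not the
streamlined decoder used in the modem described above.

```python
SYMBOLS = (-1.0, 1.0)


def channel(symbols, start=(1.0, 1.0)):
    """Noiseless 1 - D^2 response: y[k] = x[k] - x[k-2].

    start = (x[-2], x[-1]) is an assumed, known pair of initial symbols.
    """
    hist = list(start) + list(symbols)       # hist[k + 2] holds x[k]
    return [hist[k + 2] - hist[k] for k in range(len(symbols))]


def mlsd(received, start=(1.0, 1.0)):
    """Viterbi sequence detection with squared-error branch metrics."""
    # A state is the pair (x[k-2], x[k-1]); each state keeps one survivor,
    # stored as (accumulated cost, detected symbols so far).
    survivors = {tuple(start): (0.0, [])}
    for y in received:
        nxt = {}
        for (x2, x1), (cost, path) in survivors.items():
            for x in SYMBOLS:
                c = cost + (y - (x - x2)) ** 2
                key = (x1, x)
                if key not in nxt or c < nxt[key][0]:
                    nxt[key] = (c, path + [x])
        survivors = nxt
    # The sequence is left unterminated, so pick the best final survivor.
    return min(survivors.values(), key=lambda cp: cp[0])[1]
```

In this toy setting only four states are needed, which is why the detector stays simple for
short responses such as 1 - D^2 even though the state count grows exponentially with the
length of the channel response.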

Meanwhile, Jim Omura had recognized independently at UCLA that the VA could be used 
on intersymbol interference channels, because of their convolutional character [24]. Indeed, 
a tantalizing hint in this direction appears in a book review by Andy Viterbi in 1970 [31]. 
After visiting UCLA, Hisashi Kobayashi further developed this idea, particularly for practical 
applications in partial response modems and magnetic recording [18], [19]. 

The VA proved to be too complicated for general use as an equalizer on ISI channels. However, 
it stimulated many suboptimal approximations, and analysis of its performance gave bounds on 
the best possible performance of any sequence detector. 

However, the VA did become standard in the related application of high-density magnetic 
recording. In so-called PRML systems ("partial-response equalization with maximum-likelihood 
sequence detection") [^j, the magnetic recording channel is first equalized to a simple "partial 
response" such as 1 - D^2, and the resulting sequence is then detected by the VA, or by a 
simplified version thereof, as Kobayashi had envisioned JH]. In retrospect, it seems possible 
that my little SSB modem decoder was the first implementation of such a PRML scheme. 

6 Trellis-coded modulation 

After Gottfried Ungerboeck published his invention of trellis-coded modulation in 1982 [29], 
the VA became the workhorse decoder for the next several generations of voiceband modems. 
Ungerboeck extended trellis coding to multilevel constellations by constructing trellis codes in 
which each branch of the trellis represents a subset of constellation symbols, rather than a single 
symbol. By clever constellation partitioning and attention to distances between subsets, he was 
able to obtain coding gains in the bandwidth-limited regime comparable to those that can be 
obtained in the power-limited regime. 
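
The effect of constellation partitioning on distances can be checked numerically. The sketch
below assumes a unit-energy 8-PSK constellation and a simple interleaved two-way split at each
level; both choices, and the names `min_distance` and `intra_subset_distances`, are
illustrative assumptions rather than details taken from the text. It reports the smallest
distance between points in the same subset at each partition level.

```python
import cmath
import math

# Assumed example constellation: unit-energy 8-PSK, points indexed 0..7.
PSK8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]


def min_distance(points):
    """Smallest Euclidean distance between two distinct points."""
    return min(abs(a - b) for i, a in enumerate(points) for b in points[i + 1:])


def intra_subset_distances():
    """Minimum intra-subset distance after 0, 1, and 2 two-way partitions."""
    levels = [[list(range(8))]]
    for _ in range(2):
        # Split every subset into its even-position and odd-position points.
        levels.append([half for s in levels[-1] for half in (s[0::2], s[1::2])])
    return [min(min_distance([PSK8[i] for i in s]) for s in level)
            for level in levels]
```

Under these assumptions the minimum distance grows from 2 sin(pi/8) (about 0.765) for the full
constellation to sqrt(2) for the two four-point subsets and to 2 for the four antipodal pairs.
This growth is what allows a trellis code to protect the subset choice while leaving the
choice of point within a subset uncoded.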

For example, the V.32 modem (1986) used an 8-state trellis code to obtain a coding gain of 
about 3.5 dB, while the later V.34 modem (1994) used 16- to 64-state trellis codes to obtain 
coding gains of 4.0 to 4.5 dB [TT] . 

7 Applications in mobile and broadcast communications 

The mobile communications channel is subject to fading, bursts, and multiuser interference, and 
is a much more difficult medium than the AWGN and linear Gaussian channels discussed above. 
The designers of second-generation (2G) cellular systems used every tool available at the time 
(early 1990s) to provide reliable communication on this difficult channel. 

The CDMA system developed by Qualcomm uses a 2^8-state, rate-1/3 convolutional code with 
interleaved 64-orthogonal modulation, and of course a Viterbi decoder. The TDMA system 
developed for GSM uses the VA not only to decode a 16-state, rate-1/2 convolutional code, 
but also for equalization. A soft-output Viterbi algorithm (SOVA) is often used in the latter 
application. 

VA decoders are currently used in about one billion cellphones, which is probably the largest 
number in any application. However, the largest current consumer of VA processor cycles is 
probably digital video broadcasting. A recent estimate at Qualcomm is that approximately 10^15 
bits per second are now being decoded by the VA in digital TV sets around the world, every 
second of every day [23]. 

8 General application to hidden Markov models 

In 1973, I wrote a tutorial paper on the Viterbi algorithm for the Proceedings of the IEEE 
[2] that has turned out to be my most cited paper by far. A recent search using Google Scholar 
shows 734 citations, far more than the 181 for my next-most-cited reference. 

One of the main points of that paper was that the VA can be applied to any problem that 
involves detecting the output sequence of a discrete-time, finite-state machine in memoryless 
noise — i.e., to detection and pattern recognition problems involving hidden Markov models 
(HMMs). Of course, decoding of convolutional codes and sequence detection on ISI channels 
were the main applications discussed in that paper. 
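
For readers who think in terms of HMMs, the following is a minimal log-domain sketch of the VA
for a generic hidden Markov model. The function name `viterbi`, the dictionary-based model
representation, and the assumption that every probability used is strictly positive are
choices made for this example only.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, most likely hidden state sequence).

    Assumes all probabilities that are looked up are strictly positive.
    """
    # score[s] = best log-probability of any state path ending in state s
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []                                # back[t][s] = best predecessor of s
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = (prev[best] + math.log(trans_p[best][s])
                        + math.log(emit_p[s][o]))
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return score[last], path[::-1]
```

This version stores back-pointers and traces back at the end rather than copying a path history
with each survivor; the two bookkeeping styles select the same path.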

During the 1970s and 1980s, the VA became widely used in a variety of pattern recognition problems 
that could be described by HMMs, particularly for speech recognition. Here the VA is 
often used as the M-step of an EM algorithm, which also adjusts HMM parameters. 

Indeed, a recent search of IEEE Xplore shows that most current IEEE references to the VA 
occur in such Transactions as Pattern Analysis and Machine Intelligence or Systems, 
Man and Cybernetics, rather than in Communications or Information Theory. It seems 
that everyone in these fields knows how to "Viterbi the data." 

Finally, in the past decade, the VA has become widely used in much more distant fields such 
as computational biology, e.g., to locate genes in DNA sequences. See for example [TH], with its 
"Viterbi Exon-Intron Locator" (VEIL). 

9 Related algorithms 



In the past decade, the development of the field of "codes on graphs" and their related decoding 
algorithms has led to a remarkable conceptual unification of a variety of detection and estimation 
algorithms which have been introduced under various names for various applications. 

In his 1996 dissertation, generalizing the earlier work of Gallager and Tanner [27], Niclas 
Wiberg [3U ESI developed the generic "sum-product" and "min-sum" decoding algorithms for 
cycle-free graphs which may include both symbol (observable) and state (hidden) variables. For 
trellis graphs, he showed that these reduce to the BCJR algorithm [2] and an algorithm equivalent 
to the Viterbi algorithm, respectively. For capacity-approaching codes such as turbo codes and 
low-density parity-check (LDPC) codes, the sum-product algorithm with an appropriate schedule 
becomes the standard iterative decoding algorithm that is normally used with such codes. 

Later authors (e.g., ^|2U]) have shown that the sum-product algorithm is equivalent to Pearl's 
"belief propagation" algorithm for statistical inference on Bayesian networks; the Baum-Welch 
or "forward-backward" algorithm for inference with hidden Markov models; and the Kalman 
smoother for linear Gaussian state-space models. 

However, it is important to note that the min-sum algorithm is a two-way "backward-forward" 
algorithm. The VA obtains the same result with a "forward-only" algorithm by storing a path 
history with each survivor. Of course, "forward-only" is a key simplification, particularly for 
real-time communications; the min-sum algorithm would never have been adopted in practice 
as widely as the VA has been. 1 

10 Conclusion 

The Viterbi algorithm has been tremendously important in communications. For moderately 
complex (not capacity-approaching) codes, it has proved to yield the best tradeoff between 
performance and complexity both on power-limited channels, such as space channels, and on 
bandwidth-limited channels, such as voiceband telephone lines. In practice, in these regimes 
it has clearly outstripped its earlier rivals, such as sequential decoding and algebraic decoding. 
(However, it seems likely that it will be superseded in many of its principal communications 
applications by capacity-approaching codes with iterative decoding.) 

Moreover, the VA has become a general-purpose algorithm for decoding hidden Markov models 
in a huge variety of applications, from speech recognition to computational biology. 

Andy Viterbi clearly did not envision the full import of the VA when he first introduced it. 
However, he and his colleagues at Linkabit and Qualcomm were largely responsible for making 
it practical, and for driving its widespread adoption in communications. The history might have 
been otherwise, but it wasn't. In actual fact, no one deserves more credit for this tremendously 
important invention than its actual inventor. 

1 Interestingly, Ungerboeck discovered both the sum-product and the min-sum algorithms for equalization 
applications in his thesis [28]; however, he missed the forward-only version. 